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Abstract 

Multivariate time series data arise in many applied domains, and it is often crucial 
to obtain a good characterization of how the covariance among the different variables 
changes over time. Certainly this is the case in financial applications in which co- 
variance can change dramatically during times of financial crisis, revealing different 
associations among assets and countries than occur in a healthier economic climate. 
Our focus is on developing models that allow the covariance to vary flexibly over con- 
tinuous time, and additionally accommodate locally adaptive smoothing of the covari- 
ance. Locally adaptive smoothing to accommodate varying smoothness in a trajectory 
over time has been well studied, but such approaches have not yet been developed for 
time-varying covariance matrices to our knowledge. To address this gap, we generalize 
recently develop methods for Bayesian covariance regression to incorporate random 
dictionary elements with locally varying smoothness. Using a differential equation rep- 
resentation, we additionally develop a fast computational approach via MCMC, with 
online algorithms also considered. The performance of the models is assessed through 
simulation studies and the methods are applied to financial time series. 

Key Words: Bayesian nonparametrics; locally varying smoothness; long-range dependence; 
multivariate time series; nested Gaussian process; stochastic volatility. 

1. Introduction 

In the field of multivariate time series, modeling the time-varying covariance structure in 
high dimensional datasets is a key issue in many application domains. For example, in our 
motivating application, the focus is on capturing volatility and co-volatilities of different 
assets and countries. There are many other applications that have arisen with the routine 
collection of online data, such as Google Trends. When the dynamics of the series are 




Figure 1: Squared Log-Returns of DAX30, using weekly data from 2004/07/19, to 2012/06/25. 

characterized by dramatic changes, it is important to incorporate locally adaptive smoothing 
of not only the mean but also the covariance. Figure [T] shows an example from financial 
time series, where volatilities, described by the squared log-returns of the German stock 
market index (DAX30), change dramatically, motivating the search for flexible models able 
to capture such rapid changes. 

There is a rich literature on univariate stochastic volatility modeling, with an increasing 
emphasis on multivariate generalizations. One popular approach estimates the p x p time- 
varying covariance matrix St via an exponentially weighted moving average (EWMA; see, 
e.g., Tsay, 2005). This approach uses a single time-constant smoothing parameter < A < 
1, with extensions to accommodate locally- varying smoothness X t not straightforward due 
to the need to maintain positive semidefinite St at every time. To address this problem, 
generalizations of EWMA have been proposed including the diagonal vector ARCH model 
(DVEC), (Bollerslev, Engle and Wooldridge, 1988) and its variant, the BEKK model (Engle 
and Kroner, 1995). However, these models are computationally demanding and are not 
designed for moderate to large p. Principal component GARCH (Ding, 1994) and O-GARCH 
(Alexander, 2001) perform dimensionality reduction through a latent factor model (see also 
van der Weide, 2002). However, time-constant factor loadings and uncorrelated latent factors 
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restrict evolution of E t . 

Such models fall far short of our goal of allowing E t to be fully flexible with the depen- 
dence between E t and S t+ A varying with not just the time-lag A but also time. In addition, 
these models do not handle missing data easily and tend to require long series for accurate 
estimation (Burns, 2005). Accommodating changes in continuous time are also important in 
many applications and to avoid having the model be critically dependent on the time scale, 
with inconsistent models obtained as time units are varied. 

An alternative to multivariate GARCH is to characterize the volatility process as a ran- 
dom function of past returns (Harvey, 1994). Missing values can be handled but most of the 
literature limits flexibility in evolution of E t by using models that characterize time changes 
solely through variances. To address this problem, Philipov and Glickman (2006a) develop 
a multivariate stochastic volatility model in which the time- varying covariance structure fol- 
lows a stochastic process based on the Wishart distribution. Computational challenges arise 
in implementations, and the use of the Wishart limits generalizations of moderate to large p 
settings. Prado and West (2010) address the problem of posterior computation for dynamic 
covariance matrices via discounting methods that maintain simple update equations as new 
observations are added. However, the model restricts the evolution of the covariance to be 
stationary and slowly-changing. 

Even for iid observations from a multivariate normal model with a single time stationary 
covariance matrix, there are well known problems with Wishart priors motivating a rich 
literature on dimensionality reduction techniques based on factor and graphical models. 
There has been abundant recent interest in applying such approaches to dynamic settings. 
Refer to Nakajima and West (2012) and the references cited therein for recent literature on 
Bayesian dynamic factor models for multivariate stochastic volatility. Their approach allows 
the factor loadings to evolve dynamically over time, while including sparsity through a latent 
thresholding approach, leading to apparently improved performance in portfolio allocation. 
They utilize a time-varying discrete-time autoregressive model, which allows the dependence 
in the covariance matrices E t and E t+ A to vary as a function of both t and A. However, 
the result is an extremely richly parameterized and computationally challenging model, with 
selection of the number of factors proceeding by cross validation. Our emphasis is instead on 
developing continuous time stochastic processes for E t , which accommodate locally- varying 
smoothness. 



3 



In this regard, a highly relevant development was the Bayesian covariance regression 
(BCR) model of Fox and Dunson (2011), which defines the covariance matrix as a regular- 
ized quadratic function of time-varying loadings in a latent factor model, characterizing the 
latter as a sparse combination of a collection of unknown Gaussian process (GP) dictionary 
functions. Although their approach provides a continuous time and highly flexible model 
that accommodates missing data and scales to large p, there are two limitations motivat- 
ing this article. Firstly, their proposed covariance stochastic process assumes a stationary 
dependence structure, and hence tends to under-smooth during periods of stability and 
over-smooth during periods of dramatic change. Secondly, the well known computational 
problems with usual GP regression are inherited, leading to difficulties in scaling to long 
series and issues in mixing of MCMC algorithms for posterior computation. 

In this article, we address both of these problems to develop a novel covariance stochastic 
process with locally-varying smoothness. This is accomplished by modifying the method of 
Fox and Dunson (2011) to incorporate dictionary functions that are assigned nested Gaussian 
process (nGP) priors (Zhu and Dunson, 2012). Such nGP priors reduce the GP computa- 
tional burden involving matrix inversions from 0(T 3 ) to 0(T), with T denoting the length 
of the time series, while also allowing flexible locally-varying smoothness. Marginalizing 
out the dictionary functions, we obtain a covariance stochastic process that inherits these 
advantages. We also propose an online implementation, which can accommodate streaming 
data. 

In Section 2, we describe the basic model structure with particular attention to prior 
specification. Section 3 explores the main features of the Gibbs sampler for posterior compu- 
tation and outlines the steps for a fast online updating approach. In Section 4 we compare 
our model to BCR through a simulation study. Finally in Section 5 an application to stock 
market indices across countries is examined. 

2. Covariance Regression with Locally Adaptive Smoothing Priors 

2.1 Notation and Motivation 

Let yi represent a p x 1 vector of observations where % = 1, T indexes time with 

Di ~ N p (n(ti),T±(ti)), 

with fi(ti) and denoting, respectively, the p x 1 mean vector and p x p covariance 
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matrix at time U G T C 9? + , where ti represents the time of the ith observation and ob- 
servations can be unequally-spaced. Following a Bayesian approach, we define priors for 
£7- = {£(£), t G T} and fij- = {fi(t),t G T}, which are designed to be flexible, accommodate 
locally- varying smoothness in time, be defined in continuous time, and facilitate scaling both 
computationally and statistically to moderately large p. 

2.2 Latent Factor Model 

A common strategy in modeling of large covariance matrices is to rely on a lower-dimensional 
factorization, with factor analysis providing one possible direction. Sparse Bayesian factor 
models have been particularly successful in challenging cases, while having advantages over 
frequentist competitors in incorporating a probabilistic characterization of uncertainty in 
the number of factors as well as the parameters in the loadings and residual covariance. For 
recent articles on Bayesian sparse factor analysis for a single large covariance matrix, refer 
to Bhattacharya and Dunson (2011), Pati et al. (2012) and the references cited there-in. 

In our setting, we are instead interested in letting the covariance matrix vary flexibly 
over time. Extending the usual factor analysis framework to this setting, we let 



where A(t) is a p x K time-varying factor loadings matrix and So = diag(cr^, cr^). There is 
a literature on using Bayesian factor analysis with time-varying loadings, but essentially all 
the literature assumes discrete-time dynamics on the loadings while our focus is instead of 
allowing the loadings, and hence the induced covariance £(t), to evolve flexibly in continuous 
time. Hence, we are most closely related to the literature on Gaussian process latent factor 
models for spatial and temporal data; refer, for example, to Lopes, Salazar and Gamerman 
(2008) and Lopes, Gamerman and Salazar (2011). In these models, the factor loadings matrix 
characterizes spatial dependence, with time varying factors accounting for dynamic changes. 
Fox and Dunson (2011) instead allow the loadings matrix to vary through a continuous time 
stochastic process built from latent GP dictionary functions. The basic structure of our 
model is identical to theirs, but we propose a fundamentally different prior on the dictionary 
elements and approach to computation. 

Model Q can be induced by marginalizing out the latent factors r/i in 



£(t) = A(t)A(t) T + S. 



yi = A(ti)rji + a 
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where rji ~ Nk(0, Ik) and q ~ N p (0, S ). We assume that the time-varying factor loadings 
matrix A(t) is a linear combination of a much smaller set of continuous dictionary functions 
£ik '■ T — > 3? comprising the L x K, with L « p, matrix As a result 

where B is a px L matrix of coefficients relating the matrix A(t) to the time- varying dictionary 
elements in £(£). Such a decomposition further reduces the number of continuous random 
functions to be modeled from p x K to L x K leading to a more computationally tractable 
formulation in which the induced covariance structure, after marginalizing out the latent 
factors, follows the equation 

cov{ yi \U = t) = E(t) = 0£(i)£(t) T e T + So (2) 

The above decomposition of £(i) is not unique. Potentially we could constrain the loadings 
matrix to enforce identifiability (Geweke and Zhou, 1996), but this approach induces an 
undesirable order dependence among the responses (Aguilar and West, 2000, West, 2003, 
Lopes and West, 2004, Carvalho et a!., 2008). Given our focus on estimation of £(£), we 
follow Ghosh and Dunson (2009) in avoiding identifiability constraints, as such constraints are 
not necessary to ensure identifiability of the induced covariance £(t). The characterization 
of the class of time- varying covariance matrices £(£) is proved by Lemma 2.1 of Fox and 
Dunson (2011) which states that for K and L sufficiently large, any covariance regression 
can be decomposed as in 

When n(t) is not know, we can incorporate nonparametric mean regression in the model 
formulation by letting 

T)i = i>(ti) N K (0, I K ), (3) 

where ip(ti) = [ipi(ti), ipK(ti)] T is a vector of continuous functions ip j : T — > 3ft that can be 
modeled similarly to the dictionary functions As a result, marginalizing out the latent 
factors, the induced mean of yi conditionally on ti = t is 

ti(t) = emm (4) 

2.3 Prior Specification 

We specify independent priors H^, n , Il So and IL^, respectively for £7- = {£(£), t G T}, 0, 
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S and ij}f = {ip(t),t G T} in ^ and Q, to induce priors lis and 11^ on E7- and nq- with 
the goal of maintaining simple computation and allowing both covariances and means to 
vary flexibly over continuous time. Fox and Dunson (2011) address this issue by considering 
dictionary function as £^ ~ GP(0, c) independently for all I, k, with c the squared exponential 
correlation function having c(£,£') = exp ( — — •C'lli)- To allow locally-adaptive smoothing 
and facilitate computation, we induce a prior on the dictionary functions through stochastic 
differential equations. 

Specifically, following the nested Gaussian process (nGP) formulation of Zhu and Dunson 
(2012), we let 

D m £ lk (t) = A lk (t)+at lk Wt lk (t), meN, m > 2, (5) 
D n A lk {t) = a Alk W Alk (t), neN, n > 1, (6) 

where Ai k is the locally instantaneous mean, a^ lk G 3ft + , a Alk G 3ft + , W^ lk : T — > 3ft and W Alk '■ 
T — > 3ft are independent Gaussian white noise processes with mean E[W^ fc (t)] = E[M^4 ifc (t)] = 
0, for all t G T, and covariance function E[W^ k (t)W ilk (t')] = E[W Alk (t)W Alk (t')] = 5{t-t') a 
delta function. This formulation naturally induces a prior for with varying smoothness, 
where E[D m ^ik(t)\Aik(t)} = Aik(t), and initialization at ti based on the assumption 

D 1 ^), D— 1 e^(ti)] T ~ N m (0, alJ m ). 

The same goes for the initial value Aik(ti) and its derivatives up to order n — 1 leading 
to the prior 

[Attfa), -D 1 ^4/fc(ti), D n - 1 4(*i)f ~ iVn(0, < fe /„). 

The prior for the initial values and for W^ lk and W Alk are assumed mutually independent. 
Finally, we let 

a 1ik ~ invGa(a^,b^) 
a\ ih ~ invGa(a A , b A ) 

independently for each (l,k); where invGa(a,b) denotes the Inverse Gamma distribution 
with shape a and scale b. 

The Markovian property implied by SDEs in (5) and (6) represents a key advantage 
in terms of computational tractability as it allows a simple state space formulation. In 
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particular, referring to Zhu and Dunson (2012) for m = 2 and n — 1 (this can be easily 
extended for higher m and n), and for 5i = t i+i — ti sufficiently small, the prior for £ lk along 
with its first order derivative £' lk and A ik follow the approximated state equation 
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where [u iAk , Ui,A lk \ T ~ N 2 (0,V^i k ), with V^ Xk = diag(cr| ifc 5 h a\ k 8i). There are two crucial 
aspects to highlight. Firstly, this formulation allows continuous time and an irregular grid 
of observations over t by relating the latent states at % + 1 to those at % through the distance 
between and ti where % represents a discrete order index and ti e T the time observation 
related to the ith observation. Secondly, compared to Zhu and Dunson (2012) our approach 
represents an important generalization in: (i) extending the analysis to the multivariate case 
(i.e. yi is a p-dimensional vector instead of a scalar) and (ii) accommodating locally adap- 
tive smoothing not only on the mean but also on the time-varying variance and covariance 
functions. 

To address the issue related to the selection of the number of dictionary elements a 
shrinkage prior lie is proposed for 0. In particular, as proposed in Bhattacharya and 
Dunson (2011) we assume: 

Bj^ju n ~ N(0, ^Vf 1 ) 0,7 ~ Ga(3/2, 3/2) 

i 

A^Ga( ai ,l), # h ~Ga(a 2 ,l),h>2, n = JJ # h (8) 

h=l 

Note that if a 2 > 1 the expected value for -&h is greater than 1. As a result, as I goes to 
infinity, T\ tends towards infinity shrinking 9ji towards zero. This leads to a flexible prior for 
9ji with a local shrinkage parameter (fiji and a global column-wise shrinkage factor r ; which 
allows many elements of being close to zero as L increases. 

Finally for the variances of the error terms in vector e«, we assume the usual inverse 
gamma prior distribution. Specifically n So is defined through 

<7~ 2 ~ Ga(a a , b a ) 

independently for each j = l,...,p. To conclude prior specification, if /i(t) is unknown, a 
similar approach based on dictionary elements can be considered in the definition of the 



8 



prior n^. In particular, recalling the previous results of the prior for we can represent 
the prior for ip k with the following state equation 

4>k{U+i) 

^fc(*i+i) 
_ B k (t i+ i) 

independently for k = 1, where [ui^ k ,u iyBk ] T ~ N 2 (0,S iik ), with = diag(cr^^, a| fc ^). 

Similarly to the priors for the initial values are assumed 

[Mti),D 1 Mti),---,D m - 1 Mti)] T ~ iV m (0,</ m ), 

[^(to,^ 1 ^^),...,^- 1 ^^^ ~ iv n (o,</ n ), 

while those for the variances in the state equation follow 

a\ k ~ invGa(a^,,b^,), 
o\ k ~ invGa(a,B,bB)- 

2.4 Hyperparameter interpretation 

We now focus our attention on the hyperparameters of the priors for ct| , , a| fe and 
crf^. Several simulation studies have shown that the higher the variances in the latent state 
equations, the better our formulation accommodates locally adaptive smoothing for sudden 
changes in covariances and means. A theoretical support for this data-driven consideration 
can be identified in the connection between the nGP prior and nested smoothing splines. It 
has been shown by Zhu and Dunson (2012) that the posterior mean of U with reference to 
the problem of nonparametric mean regression under the nGP prior can be related to the 
minimizer of the equation 

rp 

- J>i - U(t t )) 2 + \u f (D m U(t) - C{t)fdt + X C I (D n C(t)) 2 dt, 

where C is the locally instantaneous function and Xu e and Ac G 3? + regulate the 
smoothness of unknown functions U and C respectively, leading to less smoothed patterns 
when fixed at low values. The resulting inverse relationship between these smoothing pa- 
rameters and the variances in the state equation, together with the results in the simulation 
studies, suggest to fix the hyperparameters in the Inverse Gamma prior for cr| !fe , a\ , <r 2 k and 
a 2 Bk so as to allow high variances in the case in which the time series analyzed are expected 
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to have strong changes in their covariance (or mean) dynamic. In practical applications, it 
may be useful to obtain a first estimate of the covariance matrix E(t) and the mean vector 
jx{t) to set the hyperparameters. More specifically, jij(ti) can be the output of a standard 
moving average on each time series yj = [yjx, yjr], while can be obtained by a simple 
estimator, such as the EWMA procedure. With these choices, the recursive equation 

£(ti) = (1 - X){[y l -i - /2(t^0][yi_i - /i(^-i)f } + AE(*i_i) 
become easy to implement. 

3. Posterior Computation 

The algorithm for posterior computation alternates between a simple and efficient simulation 
smoother step (Durbin and Koopman, 2002) to update the state space formulation of the 
nGP, and standard Gibbs sampling steps for updating the parametric component parameters 
from their conditional distributions. When also the mean process needs to be estimated, an 
additional step with a block sampling for {/0(^»)H=i anc ^ { u i}f=i * s implemented. 

3.1 Gibbs Sampling 

We outline here the main features of the algorithm for posterior computation based on 
observations (y*, U) for i = 1, ...,T, while the complete algorithm is provided in the Appendix. 

A. Given and {r]i}J =1 , a multivariate version of the MCMC algorithm proposed by Zhu 

and Dunson (2012) draws posterior samples from each dictionary element's function 
{£,ik{ti) }f=i , its first order derivative {£ik(ti)}T=i, the corresponding instantaneous mean 
{Aik(ti)}J =1 , the variances in the state equations <r| ;fc , a\ ik and the variances of the error 
terms in the observation equation cr| with j = 1, ...,p. 

B. If the mean process needs not to be estimated, recalling the prior rji ~ N K * (0,I K *) 

and model ([T|, the standard conjugate posterior distribution from which to sample the 
vector of latent factors for each i given 6, {aj 2 }? =1 , {yi}f =1 and {^(U)}f =1 is Gaussian. 
Otherwise, if we want to incorporate the mean regression, through model Q, we 
implement a block sampling of {ip(ti)}J =1 and {vi}J =1 following a similar approach 
used for drawing samples from the dictionary elements process. 

C. Finally, conditioned on {yi}f =1 , {r]i}J =1 , {<J~ 2 } P j=i an d {£{ti)}J=i, and recalling the shrink- 
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age prior for the elements of G in (J8J) , we update G, each local shrinkage hyperparameter 
4>ji and the global shrinkage hyperparameters ti following the standard conjugate anal- 
ysis. 

3.2 Online Updating 

The problem of online updating represents a key point in multivariate time series with high 
frequency data. Referring to our formulation, we are interested in updating an approximated 
posterior distribution for £(£t+/i) and fi(tT+h) with h = 1, ...,H once a new vector of obser- 
vation {yi}J=j>+i is available, instead of rerunning posterior computation for the whole time 
series. 

Using the posterior estimates of the Gibbs sampler based on observations available up to 
time T, {yi}f = i, it is easy to implement (see in Appendix) a highly computationally tractable 
online updating algorithm which alternates between steps A and B outlined in the previous 
section for the new set of observations, and that can be initialized at T + 1 using the one 
step ahead predictive distribution for the latent state vector in the state space formulation. 

Note that the initialization procedure for latent state vectors in the algorithm depends 
on the sample moments of the posterior distribution for the latent states at T. As it is known 
for Kalman smoothers (see, e.g., Durbin and Koopman 2001), this could lead to computa- 
tional problems in the online updating due to the larger conditional variances of the latent 
states at the end of the sample (i.e., at T). To overcome this problem, we replace the previ- 
ous assumptions for the initial values with a data-driven initialization scheme. In particular, 
instead of using only the new observations for the online updating, we run the algorithm 
for {y i }JJ'^ I _ k , with k small, and choosing a diffuse but proper prior for the initial states at 
T — k. As a result the distribution of the smoothed states at T is not anymore affected by 
the problem of large conditional variances leading to better online updating performance. 

4. Simulation Studies 

The aim of the following simulation study is to assess whether and to what extent the 
proposed model can accommodate, in practice, even dramatic changes in the time-varying 
covariances and to compare the performance of our proposal (LBCR, local Bayesian co- 
variance regression) with respect to BCR proposed by Fox and Dunson (2011). In the last 
subsection we also analyze the performance of the proposed online updating algorithm. 
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4.1 Estimation Performance 

We generate a set of 5- dimensional observations for each ti in the discrete set T = 
{1, 2, 100}, from the latent factor model in Q with A(£j) = 0£(£i) and i]i defined as in 
(j3|). To allow dramatic changes of the covariances in the generating mechanism, we consider 
a 2 x 2 (i.e. K = L = 2) matrix of time- varying functions adapted from Donoho 

and Johnstone (1994) with locally-varying smoothness (more specifically we choose 'bumps' 
functions). The latent mean dictionary elements {/0(^i)}i=i are simulated from a Gaussian 
process GP(0, c) with length scale k = 10, while the elements in matrix can be obtained 
from the shrinkage prior in (j8J) with ai = a 2 = 10. Finally the elements of the diagonal 
matrix Sq 1 are sampled independently from Ga(l, 0.1). 

Posterior computation for our proposed approach is performed by using truncation levels 
K* = L* =2; placing a Ga( 1,0.1) prior on the precision parameters a~ 2 and choosing 
a \ — a 2 — 2. As regards the nGP prior for each dictionary element with I = 1,...,L* 
and k = 1, iT*, we choose diffuse but proper priors for the initial values by setting a^ [k = 
a\ ik = 100 and place an invGa(2, 5 x 10 8 ) prior on each er| and er^L in order to allow 
less smoothed behavior according to a previous graphical analysis of £(tj) estimated via 
EWMA. Similarly we set <r^ fc = cr^, fe = 100 in the prior for the initial values of the latent 
state equations resulting from the nGP prior for ipk, and consider = Ob = = bs = 0.005 
to balance the rough behavior induced on the nonparametric mean functions by the settings 
of the nGP prior on as suggested from previous graphical analysis. Note also that for 
posterior computation, we first scale the predictor space to (0, 1], leading to 5i = 1/100, for 
i = 1,...,100. 

For inference in BCR we consider the same previous hyperparameters setting for and 
S priors as well as the same truncation levels K* and L* , while the length scale k in GP 
prior for £^ and ipk has been set to 10 using the data-driven heuristic outlined by Fox and 
Dunson (2011). In both cases we run 50,000 Gibbs iterations discarding the first 20,000 as 
burn-in and thinning the chain every 5 samples. Examination of trace plots of the elements 
of {£(ti)}i£i and {^(ti)}}^ showed no evidence against convergence. 

Figure [2] compares true and posterior mean (for both approaches) of jx{t) and £(t) over 
the predictor space T Q together with the point-wise 95% high posterior density intervals. 
From this plots we can clearly note that our approach is able to capture conditional het- 
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Covariance series 2 - 5 



Variance series 2 



Mean series 4 




Figure 2: Plots of truth (black) and posterior mean respectively of LBCR (solid red line) and BCR (solid 
green line) for selected components of the covariance (left), variance (middle), mean (right). For both 
approaches the dotted lines represent the 95% highest posterior density intervals. 

eroscedasticity as well as mean patterns, also in correspondence of dramatic changes in the 
time- varying true functions. The major differences compared to the true values can be found 
at the beginning and at the end of the series and are likely to be related to the structure of 
the simulation smoother which also causes a widening of the credibility bands at the very 
end of the series, for references regarding this issue see Durbin and Koopman (2001). How- 
ever, even in the most problematic cases, the true values are within the bands of the 95% 
highest posterior density intervals. Much more problematic is the behavior of the posterior 
distributions for BCR which badly over-smooth both covariance and mean functions leading 
also to many 95% highest posterior density (hpd) intervals not containing the true values. 
The comparison of the summaries of the squared errors between true values {^(ti)}i=x and 
{£(**) and posterior mean {S(^) }J™ and {//(*») }i™ respectively for BCR and LBCR in 
Table [TJ once again confirms the overall better performance of our approach. 

To better understand the improvement of our approach in allowing locally varying 
smoothness and to evaluate the consequences of the over-smoothing induced by BCR on 
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Mean 




Covariance {£(£;)} 




BCR 


LBCR 


BCR 


LBCR 


Mean 


2.24 


1.64 


1351.09 


667.27 


90th Quantile 


4.86 


3.22 


1608.45 


1556.38 


95th Quantile 


9.39 


4.98 


5140.102 


3460.47 


Max 


74.57 


67.87 


96092.35 


22373.37 



Table 1: Summaries of the squared errors between true values {fJ,(ti)}}^{ and {E^)}^ and posterior 
mean {E(i i )} J 1 £° 1 and obtained respectively with BCR and our LBCR. 

the distribution of y^ with i = 1,...,100 consider Figure [3] which shows, for some selected 
series {yji}}^\, the time varying mean together with the point- wise 2.5% and 97.5% quantiles 
of the marginal distribution of yji induced respectively by the true mean and true variance, 
the posterior mean of Hj(ti) and T,jj(ti) from our proposed approach and the posterior mean 
of the same quantities from the competing alternative. We can clearly see that the marginal 
distribution of yji induced by BCR is over-concentrated near the mean leading to incorrect 
inferences. Note that our proposal is also able to accommodate heavy tails, a typical char- 
acteristic in financial series. 

4.2 Online Updating Performance 

To analyze the performance of the online updating algorithm, we simulate 50 new observa- 
tions {yi}l= W i with t,i £ T* = {101, 150}, considering the same 6 and S used in the 
generating mechanism for the previous simulated data and taking the 50 subsequent obser- 
vations of the bumps functions for the dictionary elements {£(^)}i=ioi; finally the additional 
latent mean dictionary elements {i()(ti)}}= l0l are simulated as before maintaining the continu- 
ity with the previously simulated functions {^(ti)}}^. According to the algorithm described 
in Section 3.2, we fix 6, S , Wl lk }, {°~A lk }, { a ^ k } an d {°~ 2 B k } a ^ their posterior mean from 
the previous Gibbs sampler and consider the last three observations j/gs, ygg and yioo (i-e. 
k — 3) to initialize the simulation smoother in i = 101 through the proposed data-driven 
initialization approach. Posterior computation shows good performance in terms of mixing, 
and convergence is assessed after 5,000 Gibbs iterations with a small burn-in of 500. 

Figure [4] compares true mean and covariance to posterior mean of a select set of com- 
ponents of {M^)};=ioi an d {^(^)}i=ioi> including also the 95% hpd intervals. The results 
clearly show that the online updating is characterized by a good performance which allows 
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Figure 3: Plot for 3 selected simulated series of the time- varying mean /ij(ii) and the time- varying 2.5% and 
97.5% quantiles of the marginal distribution of yji with true mean and variance (black), mean and variance 
from posterior mean of LBCR (red), mean and variance from posterior mean of BCR (green). Black points 
represent the simulated data. 

to capture the behavior of new observations conditioning on the previous estimates. Note 
that the posterior distribution of the approximated mean and covariance functions tends 
to slightly over-estimate the patterns of the functions at dramatic changes, however also in 
these cases the true values are within the bands of the credibility intervals. Finally note that 
the data-driven initialization ensures a good behavior at the beginning of the series, while 
the results at the end have wide uncertainty bands as expected. 

5. Application 

Spurred by the recent growth of interest in the dynamic dependence structure between fi- 
nancial markets in different countries, and in its features during the crisis that have followed 
in recent years, we applied our LBCR to the multivariate time series of the main national 
stock market indices. 

5.1 National Stock Indices (NSI), Introduction and Motivation 

National Stock Indices represent technical tools that allow, through the synthesis of numerous 
data on the evolution of the various stocks, to detect underlying trends in the financial 
market, with reference to a specific basis of currency and time. More specifically, each 
Market Index can be defined as a weighted sum of the values of a set of national stocks, 
whose weighting factors is equal to the ratio of its market capitalization in a specific date 
and overall of the whole set on the same date. 
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Figure 4: Plots of truth (black) and posterior mean of the online updating procedure (solid red line) for 
selected components of the covariance (left), variance (middle), mean (right). The dotted lines represent the 
95% highest posterior density intervals. 

In this application we focus our attention on the multivariate weekly time series of the 
main 33 (i.e. p = 33) National Stock Indices from 12/07/2004 to 25/06/2012 (T=416 weeks). 
Figure [5] shows the main features in terms of stationarity, mean patterns and volatility of two 
selected NSI downloaded from http :/ /finance, yahoo. com/\ The non-stationary behavior, to- 
gether with the different bases of currency and time, motivate the use of logarithmic returns 
Uji = log(ijj//jj_i), where Iji is the value of the National Stock Index j at i. Beside this, 
the marginal distribution of log returns shows heavy tails and irregular cyclical trends in the 
nonparametric estimation of the mean, while EWMA estimates highlight rapid changes of 
volatility during the financial crisis observed in the recent years. All these results, together 
with large settings and high frequency data typical in financial fields, motivate the use of 
our approach to obtain a better characterization of the time-varying dependence structure 
among financial markets. 
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Figure 5: Plots of the main features of USA NASDAQ (top) and ITALY FTSE MIB (bottom). Specifically: 
observed time series (left), log-returns series (black) with nonparametric mean estimation via 12 week Equally 
Weighted Moving Average (red) in the middle, EWMA volatility estimates (right). 

5.2 LBCR for National Stock Index (NSI) 

We consider the heteroscedastic model jji ~ A^ 33 (/x(tj), E(tj)) for i — 1, ...,415 and tj in the 
discrete set T Q = {1, 2, 415}, where mean and covariance matrix E(tj) of the National 
Stock Indices at time t = ti are given in (|4]) and ([2]), respectively. 

Posterior computation is performed by first rescaling the predictor space T to (0, 1] and 
using the same setting of the simulation study, with the exception of the truncation levels 
fixed at K* = 4 and L* = 5 and the hyperparameters of the nGP prior for each £^ and ipk with 
I — 1, L* and k — 1, K*, set to = a a = = a# = 2 and = bA = b^, = bs = 5 x 10 7 
to capture also rapid changes in the mean functions according to Figure [5] Missing values 
in our dataset do not represent a limitation since the Bayesian approach and the use of 
stochastic processes in continuous time allow us to update our posterior considering solely 
the observed data. We run 10,000 Gibbs iterations with a burn-in of 2,500. Examination of 
trace plots for and {/i(tj)}^ showed no evidence against convergence. 

Posterior distributions for the variances in Figure [6] demonstrate that we are clearly 
able to capture the rapid changes in the dynamics of volatility that occur during the world 



17 



USA NASDAQ USA NASDAQ Volatility: Posterior Estimates 




2004-07-19 2006-02-18 2007-09-21 2009-04-22 2010-11-23 2012-06-25 2004-07-19 2006-02-18 2007-09-21 2009-04-22 2010-11-23 2012-06-25 



ITALY FSTE MIB ITALY FSTE MIB Volatility: Posterior Estimates 




2004-07-19 2006-02-18 2007-09-21 2009-04-22 2010-11-23 2012-06-25 2004-07-19 2006-02-18 2007-09-21 2009-04-22 2010-11-23 2012-06-25 



Figure 6: Left: Plot for 2 NSI, respectively USA NASDAQ (top) and ITALY FTSE MIB (bottom), of the 
log returns (black) and the time-varying estimated mean {Aj(^i)}fii together with the time-varying 2.5% 
and 97.5% quantilcs (red) of the marginal distribution of yji from LBCR. Right: posterior mean (black) and 
95% hpd (dotted red) for the variances {T,jj{ti)}fl^. 

financial crisis of 2008, in early 2010 with the Greek debt crisis and in the summer of 2011 
with the financial speculation in government bonds of European countries together with the 
rejection of the U.S. budget and the downgrading of the United States rating. Moreover, the 
resulting marginal distribution of the log returns induced by the posterior mean of Hj(t) and 
Ejj(i), shows that we are also able to accommodate heavy tails as well as cyclical trends for 
the means. Important information about the ability of our model to capture the evolution 
of world geo-economic structure during different finance scenarios are provided in Figures [7] 
and|8} From the correlations between NASDAQ and the other National Stock Indices (based 
on the posterior mean {E(£t)}*=i of the covariances function) in Figure]^ we can immediately 
notice the presence of a clear geo-economic structure in world financial markets, where the 
dependence between the U.S. and European countries is systematically higher than that 
of South East Asian Nations (Economic Tigers), showing also different reactions to crises. 
Plots at the top of the Figure [8] confirms the above considerations showing how Western 
countries exhibit more connection with countries closer in terms of geographical, political 
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Figure 7: Black line: For USA NASDAQ median of correlations with the other 32 NSI based on posterior 
mean of Red lines: 25%, 75% (dotted lines) and 50% (solid line) quantiles of correlations 

between USA NASDAQ and European countries (without considering Greece and Russia which present a 
specific pattern). Green lines: 25%, 75% (dotted lines) and 50% (solid line) quantiles of correlations between 
USA NASDAQ and the countries of Southeast Asia (Asian Tigers and India). The timeline is divided in 
windows that relate to the main financial events of the recent years, specifically event A corresponds to the 
burst of U.S. housing bubble, event B to the concrete risk of failure of the first U.S. credit agencies (Bear 
Stearns, Fannie Mae and Freddie Mac), event C to the world financial crisis after the Lehman Brothers' 
bankruptcy, event D to the Greek debt crisis, event E to financial reform launched by Barack Obama and 
EU efforts to save Greece (the two peaks represent respectively Irish debt crisis and Portugal debt crisis), 
event F to the worsening of European sovereign-debt crisis and the rejection of the U.S. budget, finally G to 
the crisis of credit institutions in Spain and the growing financial instability of the Eurozone. 

and economic structure; the same holds for Eastern countries where we observe a reversal of 
the colored curves. As expected, Russia is placed in a middle path between the two blocks. 
A further element that our model captures about the structure of the markets is shown in 
the plots at the bottom of Figure [8j The time- varying regression coefficients obtained from 
the standard formulas of the conditional normal distribution based on the posterior mean of 
{M*i)}f=i an d {^(U)}t=i highlight clearly the increasing dependence of European countries 
with higher crisis in sovereign debt and Germany, which plays a central role in Eurozone as 
expected. 

The flexibility of the proposed approach and the possibility of accommodating varying 
smoothness in the trajectories over time, allow us to obtain a good characterization of the 
dynamic dependence structure according with the major theories on financial crisis. Figure [7] 
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Figure 8: Top: For 3 selected stock market indices, respectively GERMANY DAX30 (top), CHINA SSE 
Composite (middle) and RUSSIA RTSI Index (bottom), plot of the median of the correlation based on 
posterior mean of with the other 32 world stock indices (black), the European countries without 

considering Greece and Russia (red) and the Asian Tigers including India (green). Bottom: For 3 of the 
European countries more subject to sovereign debt crisis, respectively ITALY (left), SPAIN (middle) and 
GREECE (right), plot of 25th, 50th and 75th quantiles of the time- varying regression parameters based on 
posterior mean with the other countries (black) and Germany (red). 

shows how the change of regime in correlations occurs exactly in correspondence to the burst 
of the U.S. housing bubble (A), in the second half of 2006. Moreover we can immediately 
notice that the correlations among financial markets increase significantly during the crises, 
showing a clear international financial contagion effect in agreement with other theories on 
financial crisis (see, e.g., Baig and Goldfajn, 1999, Stijn and Forbes, 2009). As expected the 
persistence of high levels of correlation is evident during the global financial crisis between 
late-2008 and end-2009 (C), at the beginning of which our approach also captures a dramatic 
change in the correlations between the U.S. and Economic Tigers, which lead to levels close 
to those of Europe. Further rapid changes are identified in correspondence of Greek crisis 
(D), the worsening of European sovereign- debt crisis and the rejection of the U.S. budget 
(F) and the recent crisis of credit institutions in Spain together with the growing financial 
instability Eurozone (G). Finally, even in the period of U.S. financial reform launched by 
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Barack Obama and EU efforts to save Greece (E), we can notice two peaks representing 
respectively Irish debt crisis and Portugal debt crisis. 

5.3 National Stock Indices, Updating and Predicting 

The possibility to quickly update the estimates and the predictions as soon as new data arrive, 
represents a crucial aspect to obtain quantitative information about the future scenarios of 
the crisis in financial markets. To answer this goal, we apply the online updating algorithm 
presented in Section 3.2, to the new set of weekly observations {yi}f^ 416 from 02/07/2012 to 
13/08/2012 conditioning on posterior estimates of the Gibbs sampler based on observations 
{|/i}|=i available up to 25/06/2012. We initialized the simulation smoother algorithm with 
the last 8 observations of the previous sample. 

Plots at the top of Figure [9] show, for 3 selected National Stock Indices, the new observed 
log returns {yji}^^ (black) together with the mean and the 2.5% and 97.5% quantiles of 
the marginal distribution (red) and conditional distribution (green) of yji\y^ J with y~ 3 = 
{Vqii Q 7^ j}- We use standard formulas of the multivariate normal distribution based on the 
posterior mean of the updated {£(tj)}^ 16 and {a*(^)}I^416 after 5,000 Gibbs iterations with 
a burn in of 500. This is sufficient for convergence based on examining trace plots of the 
time-varying mean and covariance matrices. From these results, we can clearly notice the 
good performance of our proposed online updating algorithm in obtaining a characterization 
for the distribution of new observations. Also note that the multivariate approach together 
with a flexible model for the mean and covariance, allow for significant improvements when 
the conditional distribution of an index given the others are analyzed. 

To obtain further information about the predictive performance of our LBCR, we can 
easily use our online updating algorithm to obtain h step-ahead predictions for E(t T+ / l | T ) and 
n(tT+h\T) with h = 1, H. In particular, referring to Durbin and Koopman (2001), we can 
generate the forecasts E(tT+h|T) and fx(tr+h\T) for h = 1, H merely by treating {yi}JJ^ +1 
as missing values in the proposed online updating algorithm. Here, we consider the one step 
ahead prediction (i.e. H = 1) problem for the new observations. More specifically, for each i 
from 415 to 421, we update the mean and covariance functions conditioning on information 
up to ti through the online algorithm and then obtain the predicted posterior distribution 
for S(t i+1 |j) and fi(t i+ i\i) by adding to the sample considered for the online updating a last 
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Figure 9: Top: For 3 selected NSI, respectively USA NASDAQ (left), INDIA BSE30 (middle) and FRANCE 
CAC40 (right), plot of the observed log returns (black) together with the mean and the 2.5% and 97.5% 
quantiles of the marginal distribution (red) and conditional distribution given the other 32 NSI (green) 
based on the posterior mean of {^(ii)}f=4i6 ano - {M(*i)}£ii6 fr° m the online updating procedure for the 
new observations from 02/07/2012 to 13/08/2012. Bottom: boxplots of the one step ahead prediction 
errors for the 33 NSI, where the predicted values are respectively: (a) unconditional mean {yj+i}|=4X5 = 0, 
(b) marginal mean of the one step ahead predictive distribution using the online updating procedure for 
{yi+i\i}il\i5> ( c ) conditional mean given the log returns of the other 32 NSI at i + 1 of the one step ahead 
predictive distribution using the online updating procedure for {yi+i|i}f=4ig- Predictions for (b) and (c) are 
induced by the posterior mean of {£(ij+i)}f=4ig and {ju(fj+i)}f=4i5 of our LBCR. 

column y i+ i of missing values. 

Plots at the bottom of Figure [9] show the boxplots of the one step ahead prediction 
errors for the 33 NSI obtained as the difference between the predicted value Vj t i+i\i and, once 
available, the observed log return Vj^+i with i + 1 = 416, 422 corresponding to weeks from 
02/07/2012 to 13/08/2012. In (a) we forecast the future log returns with the unconditional 
mean {yi + i}f^} Al5 = 0, which is what is often done in practice under the general assumption 
of zero mean, stationary log returns. In (b) we consider yi+%\i = £i(t i+ i\i), the posterior mean 
of the one step ahead predicted nonparametric mean, obtained from the previous proposed 
approach after 5,000 Gibbs iteration with a burn in of 500. Finally in (c) we suppose that 
the log returns of all National Stock Indices except that of country j (i.e., Uj,i+i) become 
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available at t i+1 and, considering y i+1 \i ~ N p (p,(t i+1 \^,Il(ti +1 \i)) with p,(t i+1 \i) and 
posterior mean of the one step ahead predictive distribution respectively for /i(t i+1 |j) and 
^(U+i\i)j we forecast Vj^+i with the conditional mean of t/j^+i given the other log returns at 
time ti+i. 

Comparing boxplots in (a) with those in (b) we can see that our model allows to obtain 
improvements also in terms of prediction. Furthermore, by analyzing the boxplots in (c) we 
can notice how our ability to obtain a good characterization of the time-varying covariance 
structure can play a crucial role also in improving forecasting, since it enters into the stan- 
dard formula for calculating the conditional mean in the normal distribution. 

6. Discussion 

In this paper, we have presented a generalization of Bayesian nonparametric covariance re- 
gression to obtain a better characterization for mean and covariance temporal dynamics. 
Maintaining simple conjugate posterior updates and tractable computations in large p set- 
tings, our model increases significantly the flexibility of previous approaches as it captures 
dramatic changes both in mean and covariance dynamics while accommodating heavy tails. 
Beside these key advantages, the state space formulation enables development of a fast online 
updating algorithm particularly useful for high frequency data. 

The application to the problem of capturing temporal and geo-economic structure be- 
tween the main financial markets demonstrates the utility of our approach and the improve- 
ments that can be obtained in the analysis of multivariate financial time series with reference 
to (i) heavy tails, (ii) cyclical trends in the mean structure, (iii) dramatic changes in mean 
and covariance functions, (iii) high dimensional dataset, (iv) online updating with high fre- 
quency data (v) missing values and (vi) predictions. Potentially further improvements are 
possible using a stochastic differential equation model that explicitly incorporates prior in- 
formation on dynamics. 

Appendix 

A.l Posterior Computation in Zero Mean Case 

For a fixed truncation level L* and a latent factor dimension K* the detailed steps of 
the Gibbs sampler for posterior computations are: 
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1. Define the vector of the latent state and the error terms in the state space equation 
resulting from nGP prior for dictionary elements as 



2, — £21(^1), ••, %L*K*(U), €ll(U)--> ^L*K*(U)i A U (ti), ••) A L * K *(ti)] T 

= [^i.Cn , ^i,6i > ■■T UJ i£ L * K * ' ^1,^11)^1,^21) ••) ^i,^i» K *] T 

Given 9, {^}^ =1 , {yi}J = i, S and the variances in latent state equations {c| ifc }, W% k }, 
with / = 1, L* and = 1, If*; update by using the simulation smoother 

in the following state space model 

Vi = [rj[ ® B,0p X (2xK*xL*)\z,i + ei (10) 
Hi + i = TjHj + Rifli^ (11) 



Where the observation equation in ( 10 ) results by applying the vec operator in the la- 
tent factor model yi = ®£(ti)r)i+ei. More specifically recalling the property vec(ABC) = 
{C T <S> A)vec(B) we obtain 

Ui = vec(yi) = vec{9£(ti)rii + ej 

= vec{Q^(ti)r]i} + vec(ei) 



The state equation in (11 ) is a joint representation of the equations resulting from the 
nGP prior on each £i k defined in ^7l). As a result, the (3 x L* x K*) x (3 x L* x K*) 
matrix Tj together with the (3 x L* x K*) x (2 x L* x If*) matrix Ri reproduce, for each 
dictionary element the state equation in ^ by fixing to the coefficients relating latent 
states with different (I, k) (from the independence between the dictionary elements). 
Finally, recalling the assumptions on u>i^ [k and ^i,A lk , is normally distributed with 
E[l\ e ] = and E[fi i4 fi^] = diag(<r| ll <y i , ^ K Ji, (T^i, a\ 2l 6 h a^^Ji). 

2. Given {Si}J =l sample each <j| ;fe and a\ [k respectively from 

a tJ{^} ~ ™vGa\at + -,k+ 2 }_^ ^ j 

2 ls ~ x ■ r ( T {A lk {t i+l ) - A lk {U)f \ 
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3. Conditioned on 0, {rji}J =1 , {yi}f =1 , and {^(^*) }^=i (obtained from Sj), the standard 
conjugate posterior from which to update aj 2 is 

^7 2 l ' M, {&J ~ + 1, b a + x - ^{yji - e^itM 2 ^ 

Where 9 r = [9 jU ...,9 jL *\ 

4. Given O, So, i/i, and £(U), the vector of latent factors at each time t can sampled from 
the Gaussian conditional distribution of r)i\Q, Eo, yt, £(U) 

n k . ((/ + e(t,) T e T s 1 ee(t J ))- 1 e(^) T e T s ^, (/ + e(^) T © T So ^mr 1 ) 

5. Given {^i}^!, {z/i}^=i, {£,(ti)}T=i an d the hyperparameters and r the shrinkage prior 
on combined with the likelihood for the latent factor model lead to the Gaussian 
posterior 





( 


" vn ' 


\ 


Oj.\{vih {yi},{t(ti)}A,r~N L * 


t e faj 2 




, Ee 




V 




/ 



where r] T = [£(£i)?7i, ^(^2)^2, f(*r)»?r] and 

E e 1 = crJ 2 fj T fj + diagtyjin, (pj L *T L *) 



6. The Gamma prior on the local shrinkage hyperparameter 0^ implies the standard 
conjugate posterior given 9ji and 77 



4>ji\9ji,n ~ Ga 2, 



3 + 



7. Conditioned on and r, sample the global shrinkage hyperparameters from 



Where r^ h) = UU^t for h = 1, 
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A. 2 Incorporating unknown mean 

The Gibbs sampler for the more general model follows exactly the steps previously outlined 
except for the step 4 which is replaced by a block sampling of {ip(ti)}J =1 and {vi\J =l . In 
particular 

4.1. Similarly to Hj and let 

% = [ii>i{ti),Mu), $k*(u)Mu), ^k* (**)> B i(ti)> B K *(ti)] T 

be the vectors of the latent state and error terms in the state space equation resulting 
from nGP prior for ip. 

Conditional on 6, {C(ti)}f=ii {Vi}T=i, ^o, and the variances in latent state equations 
{°"^fc}' { a B k }i w hh k — 1, ...,K*; sample {^i}J =1 from the simulation smoother in the 
following state space model 

Vi = [Q£(ti),0px(2xK<)]Vi + ™i, (12) 

= G^ + FA,*, (13) 



Wi ~ iV(0, 6^(tj)^(ti) T 6 T + S ). The observation equation in (|12j) results by marginal- 



izing out Vi in the latent factor model with nonparametric mean regression yi 



®£(ti)ip(U) + Q^(ti)Pi + 6j. Analogously to Hj, the state equation in (13) is a joint 
representation of the state equation induced by the nGP prior on each ip^ defined in 
@; where the (3 x K*) x (3 x K*) matrix d and the (3 x K*) x (2 x K*) matrix F t 
are constructed with the same goal of the matrices T{ and Ri in the state space model 
for Hi. Finally, ~ N 2x k*(0, diag^^, o^Si, cr^^, 0%^, a% 2 5i, o"!^^)). 

4.2. Given {^i}^ update each cx| fe and <r^ fc respectively from 

<^J{*<} ~ mvGa I + 2", ^ + - 2^ £ J 

2 irvr, l ■ n ( < T u , 1^ (B k (t i+1 ) - B k (ti)) 2 \ 
^bJW ~ wuGo I a B + 6b + g 2^ ^ J 

4.3. Conditioned on 9, E , Hi, £(U) and "0(ti), and recalling ~ Nk*(0, Ik*)', the standard 
conjugate posterior distribution i^G, £o,2/i, VK^i) is 

^ ((/ + ^) T e T £ 1 ee(t i ))- 1 e(^) T e T s 1 (/ + £(t,f e r s ^(t,))- 1, 
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with y i = yi - Q£(ti)ip(ti) = Q^(ti)ui + a 



B Online Updating Algorithm 

Consider 6, E , {cf ifc }, { a A lk }> {°"| fc } an d { a B k } fixed at their posterior mean 9, S , {<5"| fc }, 
{^% k }y {^| fe }' i^B k } respectively, and let E T , S St and ^t, be the sample mean and 
covariance matrix of the posterior distribution respectively for E T and obtained from the 
posterior estimates of the Gibbs sampler conditioned on {yi}J =1 - 

1. Given 9, t , {<t|J, {&% k }, {Vi}J=r+i and {Vi}J=T+i update {Ei}J+^ +1 by using the 
simulation smoother in the following state space model 

Vi = [rjf <g> 9,0 px (2x^*xL*)]2 i + €i 

Where S^+i can be initialized from the standard one step ahead predictive distribution 
for the state space model E T+1 ~ N(T T E T ,T T t ST T^ + R T E[n T ,&T,z\ R T) 

2. Conditioned on 9, S , {<t|J, {^)}f=T+i and { yi }J^ +1 sample {^}f = + T ^ 
through the simulation smoother in the state space model 

Hi = l<S>£(ti),0 px (2xK*)]^i + Wi 

Similarly to S T+1 , ^ T+1 ~ iV(G T ^ T , G^E^G^ + F T E[Q T ^JF^ ) 

3. Given 9, S , {yi}, an d V'(^)) for i = T + 1, ...T + H, sample i/j from the standard 
conjugate posterior distribution for z/j|9, S , jji, £(ti), ipiti): 

n k * ((/ + ^) T e T s - 'emr'auf^o x % v + e T £ 'mi))- 1 ) 

with y i = yi - 0£(ti)ip(ti) = Q£{U)vi + ej. 

4. Compute the updated covariance {^{U)}J=t+i an d m ean from the usual 
equations 

e(u) = eau)au) T e T + z 
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